Let's play a coin game!
$1000$ people each toss a coin $100$ times and count the heads.
import numpy as np
toss = np.random.choice([0, 1], size=[1000, 100]) # toss
toss_sums = np.sum(toss, axis=1) # count heads
import seaborn as sns
sns.histplot(toss_sums, discrete=True); # plot histogram
Histograms allow us to analyze the distribution of data graphically.
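As a sanity check, the number of heads in $100$ fair tosses follows a binomial distribution with mean $100 \cdot 0.5 = 50$ and standard deviation $\sqrt{100 \cdot 0.5 \cdot 0.5} = 5$. Below is a minimal sketch comparing these values with a simulation; the seeded generator `rng` is an addition for reproducibility and is not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
n_people, n_tosses, p_heads = 1000, 100, 0.5

# simulate the game: one row per person, one column per toss
toss = rng.choice([0, 1], size=(n_people, n_tosses))
toss_sums = toss.sum(axis=1)

# theoretical moments of Binomial(n_tosses, p_heads)
mean_theory = n_tosses * p_heads
std_theory = np.sqrt(n_tosses * p_heads * (1 - p_heads))

print(f'mean: simulated {toss_sums.mean():.2f}, theory {mean_theory:.2f}')
print(f'std:  simulated {toss_sums.std():.2f}, theory {std_theory:.2f}')
```

The simulated values should land close to the theoretical ones, which is what the bell shape of the histogram suggests.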
Playing with an unfair coin
Our unfair coin yields heads with probability $52\%$.
toss2 = np.random.choice([0, 1], size=[1000, 100], p=[.48, .52])
toss2_sums = np.sum(toss2, axis=1)
import matplotlib.pyplot as plt
sns.histplot(toss_sums, discrete=True, label='0.5'); # fair
sns.histplot(toss2_sums, discrete=True, label='0.52'); # unfair
plt.legend();
Is the orange histogram the same as the blue one?
We can test it using statistics.
The $\chi^2$ test compares the shapes of two histograms, $h$ and $g$.
$$X^2=\sum_{i=1}^{N}\frac{(h_i - g_i)^2}{h_i}$$
# histograms
h = np.histogram(toss_sums, bins=range(100+1))[0]
g = np.histogram(toss2_sums, bins=range(100+1))[0]
# test statistics
test_stat = 0
for i in range(100):
    if h[i] > 0: # skip empty bins to avoid division by zero
        test_stat += (h[i] - g[i])**2 / h[i]
test_stat # test statistic
197.7241088701197
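The loop above can also be written in vectorized form. The sketch below wraps it in a helper, `chi2_statistic` (a name introduced here for illustration), and applies it to two toy histograms whose result is easy to verify by hand: $(10-12)^2/10 + (20-18)^2/20 + 0 = 0.6$.

```python
import numpy as np

def chi2_statistic(h, g):
    """Sum of (h_i - g_i)^2 / h_i over the bins where h_i > 0."""
    h = np.asarray(h, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = h > 0  # skip empty reference bins to avoid division by zero
    return np.sum((h[mask] - g[mask])**2 / h[mask])

# toy histograms with three bins
h_toy = [10, 20, 30]
g_toy = [12, 18, 30]
stat = chi2_statistic(h_toy, g_toy)  # 4/10 + 4/20 + 0
print(stat)
```

Note that bins where $h_i = 0$ are ignored even if $g_i > 0$, exactly as in the loop.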
We convert the test statistic to a p-value (using the $\chi^2$ distribution).
import scipy.stats
scipy.stats.chi2.sf(test_stat, 100-1) # significant if <0.05
1.4826819136900469e-08
The histograms are different. Something's wrong with the coin!
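To get a feeling for the scale of the test statistic: a $\chi^2$-distributed variable has a mean equal to its degrees of freedom ($99$ here). The following sketch uses `scipy.stats.chi2` to find the threshold that corresponds to the $5\%$ significance level; the variable names are chosen for illustration.

```python
import scipy.stats

dof = 100 - 1        # degrees of freedom used above
significance = 0.05  # significance level

# smallest test statistic that is still considered significant
critical_value = scipy.stats.chi2.isf(significance, dof)
print(f'critical value: {critical_value:.1f}')

# p-value of a statistic equal to the degrees of freedom (a typical value)
typical_pvalue = scipy.stats.chi2.sf(dof, dof)
print(f'p-value at {dof}: {typical_pvalue:.2f}')
```

A test statistic near $198$ lies far above this threshold, which is why the p-value is so small.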
Enough of coins, back to images!
# load image
from PIL import Image
x = np.array(Image.open('nockspitze.png').convert('L'))
plt.imshow(x, cmap='gray');
sns.histplot(x.flatten(), discrete=True); # cover histogram
sns.histplot(lsbr(x, 1.).flatten(), discrete=True); # stego histogram
Let's take a closer look. What is going on with the histogram?
fig, ax = plt.subplots(1, 3, sharey=True)
for i, alpha in enumerate([.0, .5, 1.]):
    y = lsbr(x, alpha).flatten()
    sns.histplot(y, binrange=(125, 175), discrete=True, ax=ax[i]);
    ax[i].set_title(f'{alpha:.1f}');
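LSBr evens out the counts within each pair of neighboring values (an even value $2k$ and the odd value $2k+1$), so the stego histogram approaches the pair average.
$$\bar{h}_{2k}=\bar{h}_{2k+1}=\frac{h_{2k}+h_{2k+1}}{2}$$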
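To see why this happens: LSB replacement overwrites only the least significant bit, so a pixel can only move within its own pair, and random message bits split each pair's count roughly in half. The `lsbr` function used in this notebook is not shown in this section; the sketch below uses a hypothetical stand-in, `lsbr_sketch`, with random message bits and a random pixel selection, which may differ from the actual implementation. It assumes a `uint8` image.

```python
import numpy as np

def lsbr_sketch(x, alpha, seed=0):
    """Hypothetical LSB replacement: overwrite the LSB of a fraction
    `alpha` of the pixels with random message bits."""
    rng = np.random.default_rng(seed)
    y = x.flatten()  # flatten() returns a copy, x stays untouched
    n_changes = int(alpha * y.size)
    idx = rng.permutation(y.size)[:n_changes]  # pixels carrying the message
    bits = rng.integers(0, 2, size=n_changes, dtype=y.dtype)  # random message
    y[idx] = (y[idx] & 0xFE) | bits  # clear the LSB, then write the bit
    return y.reshape(x.shape)

# toy cover containing only the value 100, so the pair (100, 101) is unbalanced
cover = np.full((64, 64), 100, dtype=np.uint8)
stego = lsbr_sketch(cover, alpha=1.)

h = np.bincount(stego.flatten(), minlength=256)
print(h[100], h[101])  # roughly equal counts after full embedding
```

All pixels stay inside the pair $(100, 101)$; only the split between the two values changes.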
# histogram
h, edges = np.histogram(x.flatten(), bins=range(256+1))
# average even-odd pairs
hbar = np.repeat((h[:-1:2] + h[1::2]) / 2, 2)
fig, ax = plt.subplots(1, 2, sharey=True)
ax[0].bar(range(256), h);
ax[1].bar(range(256), hbar);
Can we use the $\chi^2$ test to detect steganography?
If the histogram is close to its pair-averaged version, steganography is likely present.
$$S=\sum_{i=0}^{255}\frac{(h_i-\bar{h}_i)^2}{\bar{h}_i}$$
# Avoid division by zero
h = h[hbar > 0]
hbar = hbar[hbar > 0]
# Chi2 test
test_stat = np.sum((h - hbar)**2 / hbar)
pvalue = scipy.stats.chi2.sf(test_stat, h.size-1) # degrees of freedom: non-empty bins - 1
pvalue # stego if >0.05
0.0
The p-value is below $5\%$, so the histograms differ: the image is a cover.
Now we run the $\chi^2$ test on a stego image.
def chi2_attack(x):
    # histograms
    h = np.histogram(x.flatten(), bins=range(256+1))[0]
    hbar = np.repeat((h[:-1:2] + h[1::2])/2, 2)
    h, hbar = h[hbar > 0], hbar[hbar > 0]
    # chi2 test
    S = np.sum((h[:-1:2] - hbar[::2])**2 / hbar[::2])
    return scipy.stats.chi2.sf(S, h.size-1)
y = lsbr(x, 1.) # create stego
pvalue = chi2_attack(y) # chi2 attack
pvalue # stego if >0.05
1.0
The p-value is above $5\%$, so the histograms match: the image is stego.
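Both outcomes can be reproduced without an image by working on histograms directly. The sketch below is an illustration under stated assumptions: the cover is synthetic (pixel values drawn from a rounded normal distribution), and full embedding is modeled by splitting each pair's total like fair coin tosses. The helper `chi2_pvalue` mirrors the logic of `chi2_attack` above, including its choice of degrees of freedom, but takes a histogram instead of an image.

```python
import numpy as np
import scipy.stats

def chi2_pvalue(h):
    """Pair-based chi2 test on a 256-bin histogram, following chi2_attack above."""
    h = np.asarray(h, dtype=float)
    hbar = np.repeat((h[0::2] + h[1::2]) / 2, 2)
    h, hbar = h[hbar > 0], hbar[hbar > 0]
    S = np.sum((h[0::2] - hbar[0::2])**2 / hbar[0::2])
    return scipy.stats.chi2.sf(S, h.size - 1)

rng = np.random.default_rng(0)

# synthetic cover: rounded and clipped normal pixel values (assumption)
pixels = np.clip(np.rint(rng.normal(128, 5, size=256 * 256)), 0, 255).astype(int)
h_cover = np.bincount(pixels, minlength=256)

# full embedding randomizes the LSB, so each pair's total is split like coin tosses
pair_totals = h_cover[0::2] + h_cover[1::2]
h_stego = np.empty_like(h_cover)
h_stego[0::2] = rng.binomial(pair_totals, 0.5)
h_stego[1::2] = pair_totals - h_stego[0::2]

print(f'cover p-value: {chi2_pvalue(h_cover):.3f}')
print(f'stego p-value: {chi2_pvalue(h_stego):.3f}')
```

The cover histogram has clearly unbalanced pairs and yields a p-value near zero, while the simulated stego histogram yields a p-value near one.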
Take-away messages
- Steganography distorts image statistics.
- The $\chi^2$ test can detect the presence of LSB replacement steganography.
Hands-on: LSBr
- Find the image(s) with steganography among the suspicious images.
- Try to extract the messages.
- The secret key is the answer to life, the universe, and everything.
# load suspicious images
t21a = np.array(Image.open('t21a.png'))
martinswand = np.array(Image.open('martinswand.png'))
statue = np.array(Image.open('statue.png'))
Run the $\chi^2$ test on each image.
chi2_attack(t21a)
0.0
chi2_attack(statue)
0.0
chi2_attack(martinswand)
1.0
key = 42 # answer to life
message = extract_lsbr(martinswand, key=key)
print(message[:300], '...')
1609
THE SONNETS
by William Shakespeare
1
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed's ...
print(extract_lsbr(t21a)) # hidden Easter egg
Hi Verena! Sorry again that you lost the key from your bike. Let's meet today by the bike stand to cut the lock. Martin.
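The helper `extract_lsbr` is defined elsewhere and its exact scheme is not shown here. As a rough idea of how key-driven LSB extraction can work, the sketch below assumes that the key seeds a pseudo-random pixel order and that the message is plain ASCII of known length. The functions `embed_sketch` and `extract_sketch` are hypothetical and are not the notebook's implementation.

```python
import numpy as np

def embed_sketch(x, message, key):
    """Hypothetical embedding: write the message bits into the LSBs of
    pixels visited in a key-dependent pseudo-random order (uint8 image)."""
    bits = np.unpackbits(np.frombuffer(message.encode('ascii'), dtype=np.uint8))
    y = x.flatten()  # copy of the cover pixels
    if bits.size > y.size:
        raise ValueError('message does not fit into the image')
    path = np.random.default_rng(key).permutation(y.size)[:bits.size]
    y[path] = (y[path] & 0xFE) | bits  # clear the LSB, then write the bit
    return y.reshape(x.shape)

def extract_sketch(y, n_chars, key):
    """Read back `n_chars` ASCII characters along the same key-dependent path."""
    path = np.random.default_rng(key).permutation(y.size)[:8 * n_chars]
    bits = y.flatten()[path] & 1  # keep only the least significant bits
    return np.packbits(bits).tobytes().decode('ascii')

# round trip on a random toy image
cover = np.random.default_rng(1).integers(0, 256, size=(32, 32), dtype=np.uint8)
stego = embed_sketch(cover, 'hello', key=42)
print(extract_sketch(stego, n_chars=5, key=42))  # prints: hello
```

Without the right key, the reader visits the wrong pixels and recovers only noise, which is why the key matters in the exercise above.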